Abstract

We believe that discontinuous linear information is never more powerful than con- 
tinuous linear information for approximating continuous operators. We prove such a 
result in the worst case setting. In the randomized setting we consider compact linear 
operators defined between Hilbert spaces. In this case, the use of discontinuous linear 
information in the randomized setting cannot be much more powerful than continuous 
linear information in the worst case setting. These results can be applied when function 
evaluations are used even if function values are defined only almost everywhere. 

1 Introduction 

We study the approximation of an operator $S$ defined between normed spaces $F$ and $G$. The operator $S$ does not have to be linear or continuous. We approximate $S(f)$ by algorithms that use information consisting of finitely many continuous or discontinuous linear functionals $L_i\colon F \to \mathbb{R}$. The error of such algorithms is defined either in the worst case or in the randomized setting.

For continuous $S$, it is hard to imagine that one can learn about $S(f)$ by using discontinuous information. On the other hand, it is well known that the Monte Carlo algorithm works nicely for multivariate integration of $L_2$-functions. This algorithm uses linear functionals given by function evaluations, which are discontinuous or even not always well defined. Hence, discontinuous information is actually used in computational practice and seems to be useful, at least in the randomized setting.
This is the subject of this paper. We want to verify the power of discontinuous linear information and compare it to the power of continuous linear information. We study the worst case and randomized settings by comparing the $n$th minimal (worst case and randomized) errors that can be achieved by using $n$ discontinuous or continuous linear functionals.

In the worst case setting, we prove that as long as $S$ is a continuous (not necessarily linear) operator, the $n$th minimal errors are exactly the same for the class $\widetilde\Lambda^{\rm all}$ of all (discontinuous or continuous) linear functionals and the class $\Lambda^{\rm all}$ of all continuous linear functionals, see Theorem 1. This means that the use of discontinuous linear functionals does not help. The situation is quite different if $S$ is discontinuous. We present an easy example of a discontinuous linear functional $S$ for which the $n$th minimal errors for the class $\widetilde\Lambda^{\rm all}$ are zero for all $n \ge 1$, whereas the $n$th minimal errors for the class $\Lambda^{\rm all}$ are infinite for all $n \ge 1$.

In the randomized setting, we mostly consider compact linear operators $S$ defined between Hilbert spaces $F$ and $G$. In this case, we know from [13] that the power of continuous linear functionals in the randomized setting is roughly the same as in the worst case setting, see Lemma 2. Here, the word "roughly" means that the $n$th minimal error in the randomized setting is at least half of the $(4n-1)$st minimal error in the worst case setting, and obviously it is at most the $n$th minimal error in the worst case setting. Combining this with the result for the worst case setting, we conclude that the power of discontinuous linear functionals in the randomized setting is roughly the same as the power of continuous linear functionals in the worst case setting. On the other hand, if we drop the assumption that $S$ is a compact linear operator between Hilbert spaces, then we can construct a problem $S$ which is not solvable in the worst case setting but is solvable and relatively easy in the randomized setting. Here, not solvable means that the $n$th minimal errors in the worst case setting do not converge to zero, and relatively easy means that the $n$th minimal errors in the randomized setting are of order $n^{-1/2}$.

For many applications the space $F$ consists of functions and we can only use function evaluations for the approximation of $S$. The class of such evaluations is called standard and denoted by $\Lambda^{\rm std}$. These evaluations are always linear but not always continuous. That is, we always have $\Lambda^{\rm std} \subseteq \widetilde\Lambda^{\rm all}$, and, depending on the space $F$, we sometimes have $\Lambda^{\rm std} \subseteq \Lambda^{\rm all}$. In either case, our results apply. In particular, if all function evaluations are discontinuous, then they may be useless in the worst case setting: the minimal worst case error of algorithms that use $n$ function values may be no smaller than the error of the best constant algorithm, which uses no function values, see Remark 2.

For some applications the space $F$ consists of equivalence classes of functions that are equal almost everywhere. This is the case for $F = L_2(D)$ with $D \subseteq \mathbb{R}^d$. Then function evaluations are not even well defined. We extend our analysis also to such function evaluations and show that again the same results as before hold.

2 Worst Case Setting 

For arbitrary normed spaces $F$ and $G$, consider an arbitrary operator $S\colon F \to G$ that does not have to be linear or continuous. We approximate $S(f)$ for $f$ from the unit ball of $F$ by algorithms that use finitely many linear functionals from $\Lambda^{\rm all}$ or from $\widetilde\Lambda^{\rm all}$, respectively. More precisely, we consider algorithms $A_n\colon F \to G$ given by

$$A_n(f) = \varphi_n\big(L_1(f), L_2(f), \dots, L_n(f)\big), \qquad (1)$$

where $n$ is a nonnegative integer, $\varphi_n\colon \mathbb{R}^n \to G$ is an arbitrary mapping, and $L_j \in \Lambda$, where $\Lambda \in \{\Lambda^{\rm all}, \widetilde\Lambda^{\rm all}\}$. Hence, for $\Lambda = \Lambda^{\rm all}$ we only use continuous linear functionals, whereas for $\Lambda = \widetilde\Lambda^{\rm all}$ we may also use discontinuous linear functionals.

The choice of $L_j$ can be nonadaptive or adaptive. It is nonadaptive if the functionals $L_j$ are the same for all $f \in F$, and it is adaptive if $L_j$ depends on the already computed values $L_1(f), L_2(f), \dots, L_{j-1}(f)$. That is,

$$N(f) = \big(L_1(f), L_2(f), \dots, L_n(f)\big)$$

is the information used by the algorithm $A_n$, and $L_j = L_j(\,\cdot\,; L_1(f), L_2(f), \dots, L_{j-1}(f)) \in \Lambda$. If the choice of all $L_j$ is independent of $f \in F$, then $N$ is nonadaptive information; if at least one $L_j$ varies with $f \in F$, then $N$ is adaptive information. For $n = 0$, the mapping $A_0$ is a constant element of the space $G$. More details can be found in, e.g., [12], [15], [17]. We define the error of such algorithms in the worst case setting, i.e.,

$$e(A_n) = \sup_{\|f\|_F < 1} \|S(f) - A_n(f)\|_G.$$

Observe that it is enough that the operator S is defined on the open unit ball in F, not 
necessarily on the whole space F. We take the open unit ball instead of the more standard 
case of the closed unit ball of F in the definition of the worst case error since this includes 
also operators with singularities on the boundary of the unit ball. For linear continuous S, 
or more generally for S uniformly continuous on the closed unit ball of F, this does not make 
a difference. 

We define the nth minimal errors of approximation of S in the worst case setting as 
follows. 
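Definition 1. For $n = 0$ and $n \in \mathbb{N} := \{1, 2, \dots\}$, let

$$e_n^{\rm all-wor}(S) = \inf_{A_n \text{ with } L_j \in \Lambda^{\rm all}} e(A_n)$$

and

$$\tilde e_n^{\rm all-wor}(S) = \inf_{A_n \text{ with } L_j \in \widetilde\Lambda^{\rm all}} e(A_n).$$

For $n = 0$, we obtain

$$e_0^{\rm all-wor}(S) = \tilde e_0^{\rm all-wor}(S) = \inf_{c \in G}\ \sup_{\|f\|_F < 1} \|S(f) - c\|_G.$$

If $S(f) = -S(-f)$ for all $\|f\|_F < 1$, it is easy to see that the best algorithm is $A_0 = 0$: for every $c \in G$ and every such $f$, the triangle inequality gives $\max\{\|S(f) - c\|, \|S(-f) - c\|\} \ge \|S(f)\|$. Then

$$e_0^{\rm all-wor}(S) = \tilde e_0^{\rm all-wor}(S) = \sup_{\|f\|_F < 1} \|S(f)\|_G.$$

The error $e_0^{\rm all-wor}(S)$ is the initial error that can be achieved without computing any linear functional on the elements $f \in F$. Clearly,

$$\tilde e_n^{\rm all-wor}(S) \le e_n^{\rm all-wor}(S) \quad \text{for all } n \in \mathbb{N}.$$

The sequences $\{e_n^{\rm all-wor}(S)\}$ and $\{\tilde e_n^{\rm all-wor}(S)\}$ are both non-increasing but not necessarily convergent to zero.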

We will use the following fact from functional analysis, see, e.g., [1, Ch. 3].

Lemma 1. Assume that $F$ is a normed space and $L \in \widetilde\Lambda^{\rm all}$ is discontinuous. Then for all real $a$ the set $\{f \in F \mid L(f) = a\}$ is dense in $F$.
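Lemma 1 can be made concrete in a toy setting. The following Python sketch is our own illustration under stated assumptions, not part of the original argument: we take the hypothetical space $F = c_{00}$ of finitely supported real sequences with the $\ell_2$ norm, and the unbounded (hence discontinuous) functional $L(f) = \sum_k k f_k$. Moving $f$ by $(a - L(f))\, e_m/m$ lands in the level set $\{L = a\}$ at distance $|a - L(f)|/m$, which tends to zero as $m \to \infty$.

```python
import math

# Hypothetical toy model (an assumption for illustration): F = c_00, finitely
# supported real sequences with the l2 norm, stored as {index k: value f_k}.

def norm2(f):
    """l2 norm of a finitely supported sequence."""
    return math.sqrt(sum(v * v for v in f.values()))

def L(f):
    """Unbounded linear functional L(f) = sum_k k * f_k.

    It is discontinuous: ||e_m / m|| = 1/m -> 0 while L(e_m / m) = 1."""
    return sum(k * v for k, v in f.items())

def axpy(f, g, scale):
    """Return f + scale * g for finitely supported sequences."""
    h = dict(f)
    for k, v in g.items():
        h[k] = h.get(k, 0.0) + scale * v
    return h

def nearby_level_set_point(f, a, m):
    """Return f + (a - L(f)) * e_m / m.

    This point lies in {L = a} at distance |a - L(f)| / m from f."""
    return axpy(f, {m: 1.0 / m}, a - L(f))

f = {1: 0.3, 2: -0.1}
for m in (10, 100, 1000):
    g = nearby_level_set_point(f, 5.0, m)
    print(m, L(g), norm2(axpy(g, f, -1.0)))
```

The distance printed in the last column shrinks like $1/m$ while $L$ stays equal to $5$, which is exactly the density claimed in Lemma 1 for this particular functional.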

We are ready to prove that discontinuous linear functionals do not help for the approxi- 
mation of continuous operators in the worst case setting. 

Theorem 1. Let $F$ and $G$ be normed spaces and let $S\colon F \to G$ be continuous. Then

$$e_n^{\rm all-wor}(S) = \tilde e_n^{\rm all-wor}(S) \quad \text{for all } n \in \mathbb{N}.$$

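Proof. We may assume that $\dim(F) = \infty$, since otherwise $\widetilde\Lambda^{\rm all} = \Lambda^{\rm all}$ and there is nothing to prove. Consider arbitrary adaptive information $N = (L_1, L_2, \dots, L_n)$ with

$$L_j = L_j(\,\cdot\,; y_1, y_2, \dots, y_{j-1}) \in \widetilde\Lambda^{\rm all} \quad\text{and}\quad y_j = L_j(f; y_1, y_2, \dots, y_{j-1}) \quad \text{for } f \in F.$$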



It is well known that the infimum of the worst case errors of algorithms $A_n$ that use the information $N$ is given by the radius of information $N$,

$$r(N) = \sup_{y \in N(F)} \operatorname{rad}\big(\{S(f) \mid N(f) = y,\ \|f\|_F < 1\}\big), \qquad (2)$$

where $\operatorname{rad}(A) = \inf_{g \in G} \sup_{a \in A} \|g - a\|$ denotes the radius of a set $A \subseteq G$, see [17]. Without loss of generality we may assume that the linear functionals $L_1, L_2, \dots, L_n$ are linearly independent, since a functional $L_j$ that is linearly dependent on $L_1, L_2, \dots, L_{j-1}$ does not increase our knowledge about the element $f$. This implies that we may assume that $N(F) = \mathbb{R}^n$.

For $y \in N(F) = \mathbb{R}^n$ and $k = 1, 2, \dots, n$, define

$$B_k = B_k(y) = \{f \in F \mid L_j(f) = y_j \text{ for } j = 1, 2, \dots, k\},$$

where $L_j = L_j(\,\cdot\,; y_1, \dots, y_{j-1})$. Then the sets $B_k$ are affine subspaces of $F$.

With each affine subspace $B$ of $F$ we associate the uniquely determined linear subspace $\bar B$ such that $B = f + \bar B$ for any $f \in B$. It is easily seen that a linear functional on $F$ is continuous on $B$ if and only if it is continuous on $\bar B$. In particular,

$$\bar B_k = \bar B_k(y) = \ker L_1 \cap \ker L_2 \cap \dots \cap \ker L_k,$$

and continuity of $L_{k+1}$ on $B_k$ is equivalent to continuity of $L_{k+1}$ on $\bar B_k$.

We may now further assume for $k = 1, \dots, n-1$ that the functional $L_{k+1}$ satisfies the following condition:

either $L_{k+1}$ is continuous on $F$ or $L_{k+1}$ is discontinuous on $B_k$.

Indeed, assume that $L_{k+1}$ is continuous on $B_k$. Then it is also continuous on $\bar B_k$. Let $L$ be a continuous linear extension of $L_{k+1}|_{\bar B_k}$ to the whole space $F$ (which exists by the Hahn-Banach theorem). Then $L_{k+1} - L$ is a linear functional which vanishes on $\bar B_k$, so that $\ker(L_{k+1} - L) \supseteq \bar B_k$. This implies that $L_{k+1} - L$ is in the span of $L_1, L_2, \dots, L_k$, and for some numbers $a_j$ we have $L(f) = L_{k+1}(f) + \sum_{j=1}^k a_j L_j(f)$ for all $f$. Hence, knowing $L_j(f)$ for $j = 1, 2, \dots, k$, we know $L(f)$ if and only if we know $L_{k+1}(f)$. This means that we can replace the functional $L_{k+1}$ in the information $N$ with the continuous functional $L$ without essentially changing the information and without changing its radius.
We now define the information $N^* = (L_1^*, L_2^*, \dots, L_n^*)$ with adaptively chosen

$$L_j^* = L_j^*(\,\cdot\,; y_1^*, y_2^*, \dots, y_{j-1}^*) \in \Lambda^{\rm all} \quad\text{and}\quad y_j^* = L_j^*(f; y_1^*, y_2^*, \dots, y_{j-1}^*) \quad \text{for } f \in F$$

such that $r(N^*) \le r(N)$. Since $N$ is arbitrary and $N^*$ consists of continuous linear functionals, this will prove the theorem.
The functionals $L_j^*$ are defined inductively. We define $L_1^* = 0$ if $L_1$ is discontinuous on $F$, and otherwise we take $L_1^* = L_1$. Observe that in the case $L_1^* = 0$ the next functional $L_2^*$ cannot be chosen adaptively, since $L_1^*(f) = 0$ for all $f \in F$. Therefore, in general, $L_2^* = L_2(\,\cdot\,; 0)$ will be different from $L_2 = L_2(\,\cdot\,; y_1)$ with $y_1 = L_1(f)$, even if $L_2$ is continuous.

Assume now inductively that for all $j = 1, 2, \dots, k$ with $k < n$ we have already defined

$$L_j^* = L_j^*(\,\cdot\,; y_1^*, y_2^*, \dots, y_{j-1}^*) \in \Lambda^{\rm all} \quad\text{with}\quad y_j^* = L_j^*(f; y_1^*, y_2^*, \dots, y_{j-1}^*) \quad \text{for all } f \in F.$$

Let $L_{k+1} = L_{k+1}(\,\cdot\,; y_1^*, y_2^*, \dots, y_k^*) \in \widetilde\Lambda^{\rm all}$ be the next linear functional of the original information $N$. Let $B_k = B_k(y^*)$ be defined as above for $y = y^*$. Define also

$$A_k = A_k(y^*) = \{f \in F \mid L_j^*(f) = y_j^* \text{ for } j = 1, 2, \dots, k\}.$$

If $L_{k+1}$ is continuous on $F$, we set $L_{k+1}^* = L_{k+1}$; if $L_{k+1}$ is discontinuous on $B_k$, we define $L_{k+1}^* = 0$. In the latter case only $y_{k+1}^* = 0$ needs to be further considered. By the condition shown above, one of these two cases always occurs, and this completes the definition of $N^* = (L_1^*, L_2^*, \dots, L_n^*)$, which consists of $n$ continuous linear functionals $L_j^*$.

We now show inductively that $B_j \subseteq A_j$ and $B_j$ is dense in $A_j$ for all $j = 1, 2, \dots, n$. Indeed, for $j = 1$ we have $B_1 = A_1$ if $L_1$ is continuous on $F$, and

$$B_1 = \{f \in F \mid L_1(f) = 0\} \subseteq A_1 = F$$

if $L_1$ is discontinuous on $F$. From Lemma 1 we know that $B_1$ is dense in $A_1$.

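Assume now that $B_j \subseteq A_j$ and $B_j$ is dense in $A_j$ for $j = 1, 2, \dots, k$ with $1 \le k < n$. Consider first the case when $L_{k+1}$ is continuous on $F$. Then $L_{k+1}^* = L_{k+1}$ and

$$A_{k+1} = \{f \in A_k \mid L_{k+1}^*(f) = y_{k+1}^*\},$$

$$B_{k+1} = \{f \in B_k \mid L_{k+1}(f) = y_{k+1}^*\} = \{f \in B_k \mid L_{k+1}^*(f) = y_{k+1}^*\}.$$

Hence

$$A_{k+1} = A_k \cap C_k \quad\text{and}\quad B_{k+1} = B_k \cap C_k \quad\text{with}\quad C_k = \{f \in F \mid L_{k+1}^*(f) = y_{k+1}^*\}.$$

Clearly, $B_k \subseteq A_k$ implies that $B_{k+1} \subseteq A_{k+1}$. We need to show that if $B_k$ is dense in $A_k$, then $B_k \cap C_k$ is dense in $A_k \cap C_k$. This is obvious if $L_{k+1}^* = 0$. Assume then that $L_{k+1}^* \ne 0$. Take $f \in A_k \cap C_k$. Then for any positive $\varepsilon$ there exists $f_\varepsilon \in B_k$ such that $\|f - f_\varepsilon\| < \varepsilon$. For $g_{k+1} \in F$ with $L_j(g_{k+1}) = 0$ for $j = 1, 2, \dots, k$ and $L_{k+1}^*(g_{k+1}) = 1$ (such an element exists by the linear independence of $L_1, \dots, L_{k+1}$), define

$$\tilde f_\varepsilon = f_\varepsilon + \big(y_{k+1}^* - L_{k+1}^*(f_\varepsilon)\big)\, g_{k+1}.$$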
Then $\tilde f_\varepsilon \in B_k \cap C_k$, and since $L_{k+1}^*(f) = y_{k+1}^*$ we have

$$\|f - \tilde f_\varepsilon\| \le \|f - f_\varepsilon\| + |L_{k+1}^*(f) - L_{k+1}^*(f_\varepsilon)|\,\|g_{k+1}\| \le \|f - f_\varepsilon\|\,\big(1 + \|L_{k+1}^*\|\,\|g_{k+1}\|\big).$$

Hence $B_k \cap C_k$ is dense in $A_k \cap C_k$, as needed.

Consider now the case when $L_{k+1}$ is discontinuous on $B_k$. Then $L_{k+1}^* = 0$ and $y_{k+1}^* = 0$. We now have

$$A_{k+1} = A_k \quad\text{and}\quad B_{k+1} = B_k \cap C_k \quad\text{with}\quad C_k = \{f \in F \mid L_{k+1}(f) = 0\}.$$

Clearly $B_{k+1} \subseteq B_k \subseteq A_k = A_{k+1}$. Since $L_{k+1}$ is discontinuous on $B_k$, it is also discontinuous on $F$. Then Lemma 1 says that $C_k$ is a linear subspace which is dense in $F$. We want to show that $B_k \cap C_k$ is dense in $A_k = A_{k+1}$. Similarly as before, we take $f \in A_k$. For any positive $\varepsilon$ we can find $f_\varepsilon \in B_k$ such that $\|f - f_\varepsilon\| < \varepsilon$. If $L_{k+1}(f_\varepsilon) = 0$, then $f_\varepsilon \in B_k \cap C_k$ and we are done. Assume then that $L_{k+1}(f_\varepsilon) \ne 0$. We now choose

$$g_{k+1} \in \bar B_k = \ker L_1 \cap \ker L_2 \cap \dots \cap \ker L_k \quad\text{with}\quad L_{k+1}(g_{k+1}) = 1.$$

Since $L_{k+1}$ is discontinuous on $B_k$, it is also discontinuous on $\bar B_k$, so the element $g_{k+1}$ can be of arbitrarily small norm. We choose $g_{k+1}$ such that $\|g_{k+1}\| < \varepsilon / |L_{k+1}(f_\varepsilon)|$. Then

$$\tilde f_\varepsilon = f_\varepsilon - L_{k+1}(f_\varepsilon)\, g_{k+1}$$

belongs to $B_k \cap C_k$ and

$$\|f - \tilde f_\varepsilon\| \le \|f - f_\varepsilon\| + |L_{k+1}(f_\varepsilon)|\,\|g_{k+1}\| < 2\varepsilon.$$

Hence $B_k \cap C_k$ is dense in $A_k$, as needed. This completes the proof that $B_n = N^{-1}(y^*)$ is dense in $A_n = (N^*)^{-1}(y^*)$.

Due to the continuity of $S$, the set $B(y^*) := \{S(f) \in G \mid N(f) = y^*,\ \|f\|_F < 1\}$ is dense in $A(y^*) := \{S(f) \in G \mid N^*(f) = y^*,\ \|f\|_F < 1\}$ (here we use that the unit ball is open), and therefore $\operatorname{rad}(A(y^*)) = \operatorname{rad}(B(y^*))$. This holds for all $y^* \in N^*(F)$, and therefore $r(N^*) \le r(N)$, as needed. □

The assumption on continuity of $S$ in Theorem 1 is needed. Indeed, assume that $\dim(F) = \infty$. Then there are discontinuous linear functionals $L\colon F \to \mathbb{R}$. Define $S = L$. Note that now $e_0^{\rm all-wor}(S) = \tilde e_0^{\rm all-wor}(S) = \infty$.

Clearly, the worst case error of $A_1(f) = L(f) = S(f)$ is zero. Therefore $\tilde e_n^{\rm all-wor}(S) = 0$ for all $n \ge 1$. On the other hand, we know that adaption does not help for linear functionals, as proved by Bakhvalov. Furthermore, if we use $n$ continuous linear nonadaptive functionals $L_j$, then Smolyak's theorem tells us that the best $\varphi_n$ in (1) is linear, i.e., there are real numbers $a_j$ for which the algorithm $A_n(f) = \sum_{j=1}^n a_j L_j(f)$ minimizes the worst case error among all algorithms that use $N = (L_1, L_2, \dots, L_n)$. The results of Bakhvalov and Smolyak can be found, e.g., in [12]. However, $S - A_n$ is still a discontinuous linear functional and therefore its worst case error is infinite. Since this holds for all continuous $N$, we have $e_n^{\rm all-wor}(S) = \infty$. Hence,

$$\tilde e_n^{\rm all-wor}(S) = 0 < e_n^{\rm all-wor}(S) = \infty \quad \text{for all } n \in \mathbb{N}.$$
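The failure of continuous information in this example can be checked numerically in a hypothetical toy model ($F = c_{00}$ with the $\ell_2$ norm and $S(f) = \sum_k k f_k$). This is our own illustration, and it only treats the particular continuous nonadaptive information $N(f) = (f_1, \dots, f_n)$ rather than all continuous $N$. For $m > n$ the elements $\pm e_m/2$ lie in the open unit ball and share the information $0$, so every algorithm $\varphi_n(N(f))$ returns the same value on both, while $S(e_m/2) - S(-e_m/2) = m$.

```python
# Hypothetical toy model: F = c_00 with the l2 norm, elements stored as {k: f_k},
# and the discontinuous linear functional S(f) = sum_k k * f_k.

def S(f):
    return sum(k * v for k, v in f.items())

def coordinate_info(f, n):
    """Continuous nonadaptive information N(f) = (f_1, ..., f_n)."""
    return tuple(f.get(k, 0.0) for k in range(1, n + 1))

def error_lower_bound(n, m):
    """Lower bound on the worst case error of ANY algorithm phi(N(f)), for m > n.

    f = e_m / 2 and -f have norm 1/2 < 1 and identical information, so
    phi(N(f)) = phi(N(-f)); the triangle inequality then gives
    max(|S(f) - phi|, |S(-f) - phi|) >= |S(f) - S(-f)| / 2 = m / 2."""
    if m <= n:
        raise ValueError("need m > n so that e_m is invisible to N")
    f, minus_f = {m: 0.5}, {m: -0.5}
    assert coordinate_info(f, n) == coordinate_info(minus_f, n)
    return abs(S(f) - S(minus_f)) / 2

# The bound m / 2 grows without limit, whereas the single discontinuous
# functional S itself gives the exact answer A_1(f) = S(f).
print([error_lower_bound(3, m) for m in (10, 100, 1000)])
```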

Although Theorem 1 deals with adaptive information, it is known that adaptive information does not help for many problems. This holds for linear operators $S$ defined over Hilbert spaces $F$ and for linear functionals $S$, whereas for linear operators defined over arbitrary normed spaces adaption may help at most by a factor of two. The reader may find a survey of such results in Chapter 4 of [15].

Remark 1. In view of the proof of Theorem 1, one may think that the use of discontinuous linear functionals is useless for approximating continuous $S$. More precisely, assume that we use nonadaptive $N = (L_1, L_2, \dots, L_n)$ for which all $L_j$ are discontinuous linear functionals. What is the radius of $N$? Is it the same as the radius of zero information? This is not true. We now show that we can achieve the radius of nonadaptive information consisting of $n-1$ continuous linear functionals. Indeed, let $N^* = (L_2^*, L_3^*, \dots, L_n^*)$ be nonadaptive continuous information, $L_j^* \in \Lambda^{\rm all}$. Take $N = (L_1, L_1 + L_2^*, \dots, L_1 + L_n^*)$ with a discontinuous linear functional $L_1$. Then all $L_1 + L_j^*$ are discontinuous. However, if we compute $y_1 = L_1(f)$ and $y_j = L_1(f) + L_j^*(f)$, then we also know $L_j^*(f) = y_j - y_1$ for $j = 2, 3, \dots, n$. Hence, we know $N^*(f)$ and therefore $r(N) \le r(N^*)$, as claimed.

It is interesting to see what happens if we apply the proof of Theorem 1 to $N$. Since $L_1$ is discontinuous, we obtain $L_1^* = 0$. However, $L_1 + L_2^*$ on $B_1 = \{f \in F \mid L_1(f) = y_1\}$ equals $y_1 + L_2^*$ and therefore it is continuous. Then we replace $L_1 + L_2^*$ by $L_2^*$. Similarly, all $L_1 + L_j^*$ with $j > 2$ will be replaced by $L_j^*$. The proof of Theorem 1 then shows that $r(N) = r(N^*)$.
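The recovery step of Remark 1 is plain arithmetic on the computed values. The Python sketch below uses a hypothetical toy model chosen for illustration: $F = c_{00}$ with the $\ell_2$ norm, the discontinuous $L_1(f) = \sum_k k f_k$, and coordinate functionals playing the role of the continuous $L_j^*$. Every entry of the computed information is a discontinuous functional of $f$, yet the differences $y_j - y_1$ return $N^*(f)$.

```python
# Hypothetical toy model: F = c_00 with the l2 norm, elements stored as {k: f_k}.

def L1(f):
    """Discontinuous linear functional L_1(f) = sum_k k * f_k."""
    return sum(k * v for k, v in f.items())

def coordinate(j):
    """Continuous linear functional f -> f_j, playing the role of some L*_j."""
    return lambda f: f.get(j, 0.0)

def discontinuous_information(f, continuous_funcs):
    """N(f) = (L_1(f), L_1(f) + L*_2(f), ..., L_1(f) + L*_n(f))."""
    y1 = L1(f)
    return [y1] + [y1 + Lj(f) for Lj in continuous_funcs]

def recover_continuous_information(y):
    """N*(f) = (y_2 - y_1, ..., y_n - y_1)."""
    return [yj - y[0] for yj in y[1:]]

f = {1: 0.5, 2: -0.25, 7: 1.0}
y = discontinuous_information(f, [coordinate(j) for j in (1, 2, 3)])
print(y, recover_continuous_information(y))
```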

Remark 2. Assume now that $F$ is a space of functions $f\colon D \to \mathbb{R}$. We consider the class $\Lambda^{\rm std}$ of all function evaluations given by the linear functionals $L_x(f) = f(x)$ for $f \in F$. Let $e_n^{\rm std-wor}(S)$ denote the $n$th minimal worst case errors of algorithms $A_n$ with $L_j \in \Lambda^{\rm std}$, i.e., algorithms that use at most $n$ function evaluations. Assume that $L_x$ is discontinuous on

$$F \cap \ker L_{x_1} \cap \ker L_{x_2} \cap \dots \cap \ker L_{x_n}$$

for all points $x_1, x_2, \dots, x_n \in D$ and all $x \in D \setminus \{x_1, x_2, \dots, x_n\}$. For instance, $F = L_2(D) \cap C(D)$ equipped with the $L_2(D)$ norm is such an example. Then it is easy to check that the proof of Theorem 1 yields $L_j^* = 0$ for all $j$. That is,

$$e_n^{\rm std-wor}(S) = e_0^{\rm std-wor}(S) = e_0^{\rm all-wor}(S).$$

Hence, in this case the use of function evaluations is completely useless.

However, similarly as in Remark 1, one can construct examples where all function evaluations are discontinuous on $F$ but may be continuous on $F \cap \ker L_{x_1} \cap \ker L_{x_2} \cap \dots \cap \ker L_{x_n}$, and then they are still useful. Indeed, consider

$$F = \Big\{ f \in C([0,1]) \ \Big|\ \int_0^1 f(x)\,dx = 2\big(f(0) + f(1)\big) \Big\}$$

equipped with the $L_2$ norm. Then the integration problem $S(f) = \int_0^1 f(x)\,dx$ is continuous and all function evaluations are discontinuous on $F$. Nevertheless, $L_1(f) = f(1) = S(f)/2$ on $F \cap \ker L_0$ is continuous, and we can compute $S(f)$ exactly using two function values of $f$.

Remark 3. For a continuous linear $S$ and a Banach space $F$, Theorem 1 can be proved modulo a factor of 2 by using the known relations between the Gelfand numbers $c_n(S)$ and the minimal errors $e_n^{\rm all-wor}(S)$; see [13] for a survey of related results. In particular, we use that

$$c_n(S) \le e_n^{\rm all-wor}(S) \le 2\,c_n(S)$$

for any linear and continuous operator $S$, see [17, Section 5.4 of Chapter 4].

It is known, see [2, Prop. 2.7.5], that the Gelfand numbers $c_n(S)$ are local in the sense that

$$c_n(S) = \sup_M c_n(S|_M),$$

where the supremum is taken over all finite dimensional subspaces $M$ of $F$ and $S|_M$ is the restriction of $S$ to $M$.

Altogether, since all linear functionals on a finite dimensional $M$ are continuous, we obtain the following inequality:

$$e_n^{\rm all-wor}(S) \le 2\,c_n(S) = 2 \sup_M c_n(S|_M) \le 2 \sup_M e_n^{\rm all-wor}(S|_M) = 2 \sup_M \tilde e_n^{\rm all-wor}(S|_M) \le 2\,\tilde e_n^{\rm all-wor}(S),$$

i.e.,

$$e_n^{\rm all-wor}(S) \le 2\,\tilde e_n^{\rm all-wor}(S).$$
We return to general continuous, not necessarily linear, operators $S$. We conjecture that the worst case error $e_n^{\rm all-wor}(S)$ is itself a local quantity, at least for compact operators $S$, i.e.,

$$e_n^{\rm all-wor}(S) = \sup_M e_n^{\rm all-wor}(S|_M), \qquad (3)$$

where again the supremum is taken over all finite dimensional subspaces $M$ of $F$, and $S|_M$ is the restriction of $S$ to $M$. This would give another and much shorter proof of Theorem 1. Indeed, (3) implies

$$e_n^{\rm all-wor}(S) = \sup_M e_n^{\rm all-wor}(S|_M) = \sup_M \tilde e_n^{\rm all-wor}(S|_M) \le \tilde e_n^{\rm all-wor}(S),$$

as claimed.

Although we do not know whether (3) holds, we easily conclude from the local property of the Gelfand numbers that at least a weak local property holds for a continuous linear $S$ and a Banach space $F$, namely

$$e_n^{\rm all-wor}(S) \le 2\,c_n(S) = 2 \sup_M c_n(S|_M) \le 2 \sup_M e_n^{\rm all-wor}(S|_M).$$

It is also known that the approximation numbers $a_n(S)$ are local as long as $S$ is a compact operator or $G$ is a dual space, see [2, Prop. 2.7.1 and 2.7.3]. This translates into the fact that, for linear nonadaptive algorithms in the worst case setting, the use of discontinuous functionals does not help. A weak form (with a factor 5) is true for arbitrary $G$ and $S$, see [2, Prop. 2.7.4].

Remark 4. Heinrich [5] proves relations between linear $n$-widths and approximation numbers and shows that they coincide for compact and absolutely convex subsets of a normed space. There is also an example showing that, in general, equality does not hold for relatively compact absolutely convex sets. The spirit of these results is similar, but there is a difference, as can be seen from Proposition 1.3 of that paper: the aim there is to compare general or continuous linear information applied to $g = S(f) \in G$, while we compare general or continuous linear information applied to $f \in F$.

3 Randomized Setting 

We now deal with randomized algorithms. As in [15, Theorem 4.42], we consider only measurable algorithms. Hence we use the following definitions.

A randomized algorithm $A$ is a pair consisting of a probability space $(\Omega, \Sigma, \mu)$ and a family $(N_\omega, \varphi_\omega)_{\omega \in \Omega}$ of mappings such that the following holds:

1. For each fixed $\omega \in \Omega$, the mapping $A_\omega = \varphi_\omega \circ N_\omega$ is a deterministic algorithm defined as before, based on adaptive information $N_\omega$ consisting of linear functionals from a class $\Lambda$.

2. Let $n(f, \omega)$ be the cardinality of the information $N_\omega$ for $f \in F$. We assume that the function $n$ is measurable.

3. The mapping $(f, \omega) \mapsto \varphi_\omega(N_\omega(f)) \in G$ is measurable.

Let $A$ be a randomized algorithm. Then the cardinality of $A$ is defined as

$$n(A) = \sup_{\|f\|_F < 1} \int_\Omega n(f, \omega)\, d\mu(\omega),$$

whereas the error of $A$ in the randomized setting is

$$e^{\rm ran}(A) = \sup_{\|f\|_F < 1} \Big( \int_\Omega \|S(f) - \varphi_\omega(N_\omega(f))\|^2\, d\mu(\omega) \Big)^{1/2}.$$

For $n \in \mathbb{N}$, we define the $n$th minimal error of $S$ in the randomized setting by

$$e_n^{\rm all-ran}(S) = \inf\{e^{\rm ran}(A) \colon n(A) \le n\}$$

if $A$ uses linear functionals from $\Lambda = \Lambda^{\rm all}$. Similarly, we define $\tilde e_n^{\rm all-ran}(S)$ with $\Lambda = \widetilde\Lambda^{\rm all}$.

Obviously, we can interpret deterministic algorithms as randomized ones with a singleton $\Omega$. That is why the $n$th minimal errors in the randomized setting cannot be larger than the $n$th minimal errors in the worst case setting,

$$e_n^{\rm all-ran}(S) \le e_n^{\rm all-wor}(S) \quad\text{and}\quad \tilde e_n^{\rm all-ran}(S) \le \tilde e_n^{\rm all-wor}(S).$$

Basically, there exists only one proof technique to obtain lower bounds for randomized algorithms, and this technique goes back to Bakhvalov, see Section 4.3.3 of [15]. The main point is to observe that the errors in the randomized setting cannot be smaller than the errors in the average case setting for an arbitrary probability measure on $F$.

If we restrict ourselves to measurable randomized algorithms, then

$$e_n^{\rm all-ran}(S) \ge \tfrac{1}{2}\sqrt{2}\; e^{\rm avg}(2n, \varrho), \qquad (4)$$

where $e^{\rm avg}(2n, \varrho)$ denotes the $2n$th minimal average case error of deterministic algorithms that use at most $2n$ linear functionals from $\Lambda^{\rm all}$, and $\varrho$ is an arbitrary (Borel) probability measure on $F$; for more details see Lemma 4.37 and Remark 4.41 of [15]. The next result is identical with Theorem 4.42 from [15], see also [13]. It is stated here with a proof since we need a small modification of this result.
Lemma 2. Assume that $S\colon F \to G$ is a compact linear operator between Hilbert spaces $F$ and $G$. Then

$$\tfrac{1}{2}\, e_{4n-1}^{\rm all-wor}(S) \le e_n^{\rm all-ran}(S) \quad \text{for all } n \in \mathbb{N}.$$

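Proof. We know that $S(e_i) = \sigma_i \tilde e_i$ with orthonormal $\{e_i\}$ in $F$ and $\{\tilde e_i\}$ in $G$. Here, $\{\sigma_i\}$ is the non-increasing sequence of singular values $\sigma_i$ of $S$, and $\lim_i \sigma_i = 0$. We also know that $e_n^{\rm all-wor}(S) = \sigma_{n+1}$. For $m > n$, consider the normalized $(m-1)$-dimensional Lebesgue measure $\varrho_m$ on the unit sphere $E_m = \{\sum_{i=1}^m \alpha_i e_i : \alpha_i \in \mathbb{R},\ \sum_{i=1}^m \alpha_i^2 = 1\}$. Then

$$A_n\Big(\sum_{i=1}^\infty \alpha_i e_i\Big) = \sum_{i=1}^n \sigma_i \alpha_i \tilde e_i$$

is the optimal algorithm using continuous linear information of cardinality $n$. This is true for the worst case setting, with error $\sigma_{n+1}$, as well as for the average case setting with respect to $\varrho_m$. Hence

$$e^{\rm avg}(n, \varrho_m)^2 = \int_{E_m} \sum_{i=n+1}^m \sigma_i^2 \alpha_i^2\, d\varrho_m(\alpha).$$

Since $\int_{E_m} \alpha_i^2\, d\varrho_m(\alpha) = 1/m$ for $i = 1, 2, \dots, m$, we obtain

$$e^{\rm avg}(n, \varrho_m)^2 = \frac{1}{m} \sum_{i=n+1}^m \sigma_i^2.$$

If we put $m = 2n$, then, since the $\sigma_i$ are non-increasing, we obtain

$$e^{\rm avg}(n, \varrho_{2n}) \ge \tfrac{1}{2}\sqrt{2}\,\sigma_{2n}.$$

Together with (4), we obtain

$$e_n^{\rm all-ran}(S) \ge \tfrac{1}{2}\sqrt{2}\, e^{\rm avg}(2n, \varrho_{4n}) \ge \tfrac{1}{2}\,\sigma_{4n} = \tfrac{1}{2}\, e_{4n-1}^{\rm all-wor}(S).$$

□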
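The quantities in the proof of Lemma 2 are easy to evaluate numerically. The Python sketch below is our own illustration: it assumes a finite list of singular values (the choice $\sigma_i = 1/i$ in the usage lines is hypothetical), evaluates $e^{\rm avg}(n, \varrho_m)^2 = m^{-1}\sum_{i=n+1}^m \sigma_i^2$, compares the resulting bound with $\sigma_{4n}/2$, and estimates the moment $\int_{E_m} \alpha_1^2\, d\varrho_m(\alpha) = 1/m$ by sampling normalized Gaussian vectors, which are uniformly distributed on the sphere.

```python
import math
import random

def avg_error_sq(sigma, n, m):
    """e^avg(n, rho_m)^2 = (1/m) * sum_{i=n+1}^{m} sigma_i^2.

    sigma is a 0-based list, so sigma[i - 1] is the singular value sigma_i."""
    return sum(s * s for s in sigma[n:m]) / m

def randomized_lower_bound(sigma, n):
    """(1/2) * sqrt(2) * e^avg(2n, rho_{4n}), the bound from (4) used in the proof."""
    return 0.5 * math.sqrt(2.0) * math.sqrt(avg_error_sq(sigma, 2 * n, 4 * n))

def sphere_second_moment(m, samples, seed=0):
    """Monte Carlo estimate of the integral of alpha_1^2 over the uniform
    probability measure on the unit sphere of R^m (exact value: 1/m)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(m)]
        total += g[0] ** 2 / sum(x * x for x in g)
    return total / samples

sigma = [1.0 / i for i in range(1, 101)]  # hypothetical singular values
n = 5
print(randomized_lower_bound(sigma, n), sigma[4 * n - 1] / 2)
print(sphere_second_moment(5, 20000), 1 / 5)
```

The first printed pair illustrates the final chain of the proof: the average case bound is never below $\sigma_{4n}/2$ because the $2n$ terms $\sigma_{2n+1}^2, \dots, \sigma_{4n}^2$ are each at least $\sigma_{4n}^2$.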



We stress that in the proof we only use finite dimensional subspaces of $F$ and linear functionals on such subspaces. Of course, for finite dimensional spaces we have $\widetilde\Lambda^{\rm all} = \Lambda^{\rm all}$, and therefore we obtain the following result.

Theorem 2. Assume that $S\colon F \to G$ is a compact linear operator between Hilbert spaces. Then

$$\tfrac{1}{2}\, e_{4n-1}^{\rm all-wor}(S) \le \tilde e_n^{\rm all-ran}(S) \le e_n^{\rm all-wor}(S) \quad \text{for all } n \in \mathbb{N}.$$
In this sense, randomization as well as allowing discontinuous linear functionals from $\widetilde\Lambda^{\rm all}$ does not (essentially) help for the approximation of compact linear operators between Hilbert spaces.

Remark 5. Assume that $S\colon F \to G$ is linear and $F$ and $G$ are normed spaces. We do not know whether

$$\lim_{n \to \infty} e_n^{\rm all-ran}(S) = 0$$

implies that $S$ is compact. It is shown in [9] that the embedding $I\colon \ell_1 \to \ell_\infty$ is a universal non-compact operator in the sense that it factors through any non-compact bounded linear operator $S\colon F \to G$ between Banach spaces $F$ and $G$:

$$\ell_1 \xrightarrow{\;V\;} F \xrightarrow{\;S\;} G \xrightarrow{\;U\;} \ell_\infty.$$

Here, $I = USV$ for some bounded linear operators $U$ and $V$. It follows that

$$e_n^{\rm all-ran}(I) = e_n^{\rm all-ran}(USV) \le \|U\|\, e_n^{\rm all-ran}(S)\, \|V\|.$$

Thus it is sufficient to decide whether

$$\lim_{n \to \infty} e_n^{\rm all-ran}(I) > 0 \quad\text{or}\quad \lim_{n \to \infty} e_n^{\rm all-ran}(I) = 0.$$



Remark 6. Also in the randomized setting it would be very interesting to know whether the error $e_n^{\rm all-ran}(S)$ is a local quantity, i.e., whether

$$e_n^{\rm all-ran}(S) = \sup_M e_n^{\rm all-ran}(S|_M),$$

or at least

$$e_n^{\rm all-ran}(S) \le c\, \sup_M e_n^{\rm all-ran}(S|_M)$$

for some constant $c$ independent of $n$. This would lead to an analogue of Theorem 1 in the randomized setting.

Remark 7. It is interesting to mention that there is a continuous nonlinear operator $S\colon F \to G$ for normed spaces $F$ and $G$ which is solvable in the randomized setting but not in the worst case setting, i.e.,

$$\lim_n e_n^{\rm all-wor}(S) = \lim_n \tilde e_n^{\rm all-wor}(S) > 0 \quad\text{and}\quad \lim_n e_n^{\rm all-ran}(S) = 0.$$
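The first equality above follows from Theorem 1. As an example, let

$$S\colon F := \ell_1 \to G := \mathbb{R}, \qquad S(x) = \|x\|_2^2 \quad \text{for all } x \in F.$$

Clearly, $S$ is a continuous nonlinear functional. Note that the constant algorithm $A_0 = \tfrac12$ has the worst case error $\tfrac12$, since $S(x) \in [0, 1]$ for $x$ from the unit ball of $\ell_1$. Let

$$c = \lim_{n \to \infty} e_n^{\rm all-wor}(S).$$

Then $c \le \tfrac12$. We now show that $c > 0$. We will use the known result of Kashin about the Gelfand width $c_n(B_1^m, \ell_2^m)$ of the unit ball $B_1^m$ of $x \in \mathbb{R}^m$ with $\|x\|_1 \le 1$, with the error measured in the $\ell_2$ norm. Namely, there are two positive numbers $c_{1,2}$ and $C_{1,2}$ such that for all $n < m$ we have

$$c_{1,2} \min\Big\{1, \Big(\frac{\ln(m/n) + 1}{n}\Big)^{1/2}\Big\} \le c_n(B_1^m, \ell_2^m) \le C_{1,2} \min\Big\{1, \Big(\frac{\ln(m/n) + 1}{n}\Big)^{1/2}\Big\}.$$

This means that $\lim_{m \to \infty} c_n(B_1^m, \ell_2^m) \ge c_{1,2}$. Using the definition of the Gelfand width (applied to the subspaces $\mathbb{R}^m \subseteq \ell_1$ of finitely supported sequences, on which all linear functionals are continuous), we conclude that

$$\inf_{L_1, L_2, \dots, L_n \in \widetilde\Lambda^{\rm all}}\ \sup_{x \in \ell_1,\ L_j(x) = 0,\ j = 1, 2, \dots, n,\ \|x\|_1 \le 1} \|x\|_2 \ge c_{1,2} \quad \text{for all } n \in \mathbb{N}.$$

Take now arbitrary adaptive $N = (L_1, L_2, \dots, L_n)$ with linear functionals $L_j$; for the information value $0$ these are the fixed functionals $L_j = L_j(\,\cdot\,; 0, \dots, 0)$. Then there exists $x$ in the unit ball of $\ell_1$ such that $L_j(x) = 0$ for $j = 1, 2, \dots, n$ and $\|x\|_2 > c_{1,2}/2$. Let $\alpha \in [0, 1]$. Then $\alpha x$ belongs to the unit ball of $\ell_1$ and $N(\alpha x) = 0$. For an arbitrary algorithm $A_n = \varphi_n(N(\cdot))$ we have $A_n(\alpha x) = \varphi_n(0)$ and $S(\alpha x) - \varphi_n(0) = \alpha^2 \|x\|_2^2 - \varphi_n(0)$. Since the values for $\alpha = 0$ and $\alpha = 1$ differ by $\|x\|_2^2$, we obtain

$$e(A_n) \ge \max_{\alpha \in [0,1]} \big|\alpha^2 \|x\|_2^2 - \varphi_n(0)\big| \ge \tfrac12 \|x\|_2^2 \ge \tfrac18\, c_{1,2}^2.$$

Hence $e_n^{\rm all-wor}(S) \ge \tfrac18\, c_{1,2}^2$ for all $n$. This proves that $c > 0$, as claimed.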
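In the randomized setting, consider the (random) linear functional

$$L(x) = \sum_{k=1}^\infty \varepsilon_k x_k$$

with random and independent signs $\varepsilon_k \in \{-1, 1\}$, each taken with probability $\tfrac12$. Since $|L(x)| \le \|x\|_1$, every realization of $L$ is a continuous linear functional on $\ell_1$. The variance of the random variable $L(x)$ is $\sigma^2(L(x)) = \|x\|_2^2$, and it can be easily estimated with independent copies $L_j$ of $L$. It is well known that the empirical variance

$$A_n(x) = \frac{1}{n-1} \sum_{j=1}^n \Big( L_j(x) - \frac{1}{n} \sum_{k=1}^n L_k(x) \Big)^2$$

with independent copies $L_j$ of $L$ has expectation $S(x) = \|x\|_2^2$ and variance $\sigma^2(A_n(x)) \le \frac{C}{n}\, \|x\|_1^4$ for some absolute constant $C$. Therefore

$$e^{\rm ran}(A_n) = \sup_{\|x\|_1 < 1} \sigma(A_n(x)) \le \Big(\frac{C}{n}\Big)^{1/2},$$

and $\lim_n e_n^{\rm all-ran}(S) = 0$, as claimed.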
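The randomized algorithm of Remark 7 is straightforward to simulate. The Python sketch below is an illustration under our own assumptions: $x$ is a finitely supported element of $\ell_1$ given as a list, the signs come from Python's pseudo-random generator, and `statistics.variance` is used because it computes the empirical variance with divisor $n-1$. Each copy of the random functional is one continuous linear functional, so $A_n$ uses $n$ of them.

```python
import random
import statistics

def random_sign_functional(x, rng):
    """One realization of L(x) = sum_k eps_k x_k with independent signs eps_k = +-1.

    |L(x)| <= ||x||_1, so each realization is continuous on l_1."""
    return sum(rng.choice((-1.0, 1.0)) * xk for xk in x)

def estimate_squared_l2_norm(x, n, rng):
    """A_n(x): empirical variance of n >= 2 independent copies L_1(x), ..., L_n(x).

    Its expectation is Var L(x) = ||x||_2^2 = S(x)."""
    values = [random_sign_functional(x, rng) for _ in range(n)]
    return statistics.variance(values)

def rms_error(x, n, repetitions, seed=0):
    """Empirical root-mean-square error of A_n(x) over independent runs."""
    rng = random.Random(seed)
    exact = sum(xk * xk for xk in x)
    sq = [(estimate_squared_l2_norm(x, n, rng) - exact) ** 2 for _ in range(repetitions)]
    return (sum(sq) / repetitions) ** 0.5

x = [0.5, 0.3, -0.2]  # hypothetical test vector with ||x||_1 = 1
for n in (10, 40, 160):
    print(n, rms_error(x, n, repetitions=1000))  # shrinks roughly like n^(-1/2)
```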



4 Function Values 



So far we did not assume that $F$ is a space of functions, and we only compared continuous linear information with arbitrary linear information. Now we assume that $F$ is a normed space of functions $f\colon D \to \mathbb{R}$ for some nonempty set $D$.

Let $L_x(f) = f(x)$ for all $f \in F$ and $x \in D$. Since $f(x)$ is well defined for all $f \in F$ and $x \in D$, the functionals $L_x$ are linear but not necessarily continuous. This class of information is called standard and denoted by $\Lambda^{\rm std}$. Obviously $\Lambda^{\rm std} \subseteq \widetilde\Lambda^{\rm all}$; however, $\Lambda^{\rm std} \subseteq \Lambda^{\rm all}$ only if all $L_x$ are continuous. Still, Theorems 1 and 2 apply in this case.

We now consider a more general case when $F$ is a space of equivalence classes of functions $f\colon D \to \mathbb{R}$. A major example is $F = L_2(D)$. Then $L_x$ is not even well defined on $F$. On the other hand, for $F = L_2(D)$ the functional $L_x(f) = f(x)$ is well defined for each function $f \in \mathbf{f}$ with $\mathbf{f} \in F$ and all $x$ from $D$. Here $f \in \mathbf{f}$ means that the well defined function $f$ is in the equivalence class $\mathbf{f}$. This type of information is successfully used for multivariate integration by the standard Monte Carlo algorithm $M_n$ in the randomized setting. Here, for $D = [0,1]^d$, we have $M_n(f) = n^{-1} \sum_{j=1}^n f(x_j)$ for independent and uniformly distributed points $x_j$. Then

$$M_n(f_1) = M_n(f_2) \quad \text{a.s.} \qquad \text{if } f_1, f_2 \in \mathbf{f},$$

and one usually uses only algorithms with this property. We would like to extend the analysis presented in the previous section also to the case when the elements of $F$ are equivalence classes of functions.

We argue as follows. As in Section 3, we assume that $S\colon F \to G$ is a compact linear operator between the Hilbert spaces $F$ and $G$. We know that then

$$S(e_i) = \sigma_i \tilde e_i$$

with a non-increasing sequence of singular values $\sigma_i$ of $S$, $\lim_i \sigma_i = 0$, with orthonormal $\{e_i\}$ in $F$ and orthonormal $\{\tilde e_i\}$ in $G$. We also know that $e_n^{\rm all-wor}(S) = \sigma_{n+1}$. Then

$$A_n\Big(\sum_{i=1}^\infty \alpha_i e_i\Big) = \sum_{i=1}^n \sigma_i \alpha_i \tilde e_i$$

is an optimal algorithm using continuous linear information of cardinality $n$. This is also true if we replace $F$ by the $(n+k)$-dimensional space $V_{n+k} = \operatorname{span}(e_1, e_2, \dots, e_{n+k})$. Hence

$$\tilde e_n^{\rm all-wor}(S|_{V_{n+k}}) = e_n^{\rm all-wor}(S|_{V_{n+k}}) = \sigma_{n+1} = e_n^{\rm all-wor}(S) \quad \text{for all } k \ge 1.$$

Suppose that we fix, as representatives of the equivalence classes $e_i$ for all $i = 1, 2, \dots, n+k$, functions that are well defined everywhere. Then the elements of $V_{n+k}$ are functions that are well defined everywhere. With this assumption we only make the oracle more powerful, i.e., the lower bound is even stronger. In this sense we can think of $\Lambda^{\rm std}$ as a subset of $\Lambda^{\rm all}$, since on the finite dimensional space $V_{n+k}$ all linear functionals are continuous. By $e_n^{\rm std-wor}(S)$ we denote the $n$th minimal worst case errors of algorithms that use at most $n$ function values for approximating $S$ over $V_{n+k}$. We obtain the following corollary.

Corollary 1. Assume that $S\colon F \to G$ is a compact linear operator between Hilbert spaces $F$ and $G$. Then

$$e_n^{\rm std-wor}(S) \ge e_n^{\rm all-wor}(S) \quad \text{for all } n \in \mathbb{N}.$$

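The same can be said for randomized algorithms. In this case we take $k \ge 3n$, so that $V_{n+k}$ contains the sphere $E_{4n}$ used in the proof of Lemma 2, update the definition of $e_n^{\rm std-ran}(S)$ accordingly, and obtain the following corollary.

Corollary 2. Assume that $S\colon F \to G$ is a compact linear operator between Hilbert spaces $F$ and $G$. Then

$$e_n^{\rm std-ran}(S) \ge \tfrac{1}{2}\, e_{4n-1}^{\rm all-wor}(S) \quad \text{for all } n \in \mathbb{N}.$$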

Remark 8. Note that we did not compare $e_n^{\rm std-ran}(S)$ with $e_n^{\rm std-wor}(S)$. We stress that randomization may help a lot for the class $\Lambda^{\rm std}$. This holds if function evaluations are continuous and also if they are not continuous. Examples for both cases can be found in Chapter 17 of [16].
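As a standard illustration of how randomization helps for $\Lambda^{\rm std}$, the Monte Carlo algorithm $M_n$ defined earlier in this section can be simulated directly. The Python sketch below is our own example: the integrand $f(x) = x_1 + \dots + x_d$ and all parameter values are hypothetical choices, and $f$ is a genuine function, i.e., one fixed representative of its equivalence class. The empirical root-mean-square error behaves like $\sigma(f)/\sqrt{n}$, where $\sigma(f)$ is the standard deviation of $f(x)$ for uniformly distributed $x$.

```python
import random

def monte_carlo(f, n, d, rng):
    """M_n(f) = (1/n) * sum_j f(x_j) with i.i.d. uniform points x_j in [0,1]^d."""
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        total += f(x)
    return total / n

def rms_error(f, exact, n, d, repetitions, seed=0):
    """Empirical root-mean-square error of M_n(f) over independent runs."""
    rng = random.Random(seed)
    sq = [(monte_carlo(f, n, d, rng) - exact) ** 2 for _ in range(repetitions)]
    return (sum(sq) / repetitions) ** 0.5

def f(x):
    # Hypothetical integrand: its integral over [0,1]^d is d/2 and sigma(f) = sqrt(d/12).
    return sum(x)

d = 3
for n in (100, 400, 1600):
    print(n, rms_error(f, d / 2, n, d, repetitions=100))  # about 0.5 / sqrt(n)
```

Quadrupling $n$ roughly halves the error, independently of $d$; this dimension-independent rate $n^{-1/2}$ is what makes $M_n$ attractive even though each function evaluation is a discontinuous (or not even well defined) functional on $L_2$.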

A major example is the embedding of a function space into another (larger) function space. The literature is very rich, see, e.g., [13]. One can study the classes $\Lambda^{\rm all}$, $\widetilde\Lambda^{\rm all}$ as well as $\Lambda^{\rm std}$ in the worst case setting and in the randomized setting. In the randomized setting we do not know whether $\Lambda^{\rm all}$ and $\widetilde\Lambda^{\rm all}$ always lead to the same results, since in Theorem 2 we assume that both $F$ and $G$ are Hilbert spaces. It is open what happens for general normed spaces $F$ and $G$, and whether an analogue of Theorem 1 for the worst case setting also holds in the randomized setting.
We also add that the lower bounds of Heinrich and Mathé in the randomized setting for specific spaces $F$ and $G$ are valid not only for $\Lambda^{\rm all}$ but also for $\widetilde\Lambda^{\rm all}$. The reason is similar to that in the proof of Theorem 2: finite dimensional subspaces of $F$ can be used for the lower bounds, and on them all linear functionals are continuous.

Acknowledgement: We thank Stefan Heinrich for valuable remarks. 